1,572 research outputs found

    Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data

    Traditional convolutional layers extract features from patches of data by applying a non-linearity to an affine function of the input. We propose a model that enhances this feature extraction process for sequential data by feeding patches of the data into a recurrent neural network and using the outputs or hidden states of the recurrent units to compute the extracted features. By doing so, we exploit the fact that a window containing a few frames of the sequential data is itself a sequence, and this additional structure might encapsulate valuable information. In addition, we allow for more steps of computation in the feature extraction process, which is potentially beneficial because an affine function followed by a non-linearity can yield features that are too simple. Using our convolutional recurrent layers, we obtain improved performance on two audio classification tasks compared to traditional convolutional layers. TensorFlow code for the convolutional recurrent layers is publicly available at https://github.com/cruvadom/Convolutional-RNN
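
    The idea in the abstract above can be illustrated with a toy sketch: slide a window over a 1-D signal, run a small recurrent unit over the frames inside each window, and keep the final hidden state as that window's feature. The function name `conv_rnn_features`, the scalar weights `w_in`, `w_rec`, `bias`, and the single-unit tanh recurrence are assumptions made for illustration only; the authors' TensorFlow implementation in the linked repository may differ substantially.

```python
import math


def conv_rnn_features(signal, window, stride, w_in, w_rec, bias):
    """Slide a window over `signal` and run a one-unit tanh RNN inside each window.

    Hypothetical toy layer: scalar frames and scalar weights, for illustration only.
    """
    features = []
    for start in range(0, len(signal) - window + 1, stride):
        hidden = 0.0  # fresh hidden state for every window
        for frame in signal[start:start + window]:
            # One recurrent step: the window is treated as a short sequence.
            hidden = math.tanh(w_in * frame + w_rec * hidden + bias)
        features.append(hidden)  # the final hidden state is the extracted feature
    return features


if __name__ == "__main__":
    signal = [0.0, 1.0, 2.0, 3.0, 4.0]
    print(conv_rnn_features(signal, window=3, stride=1, w_in=0.5, w_rec=0.8, bias=0.0))
```

    With `w_rec = 0` the feature depends only on the last frame of each window, so the recurrent weight is what lets the within-window order contribute to the feature.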

    The Many-to-Many Mapping Between the Concordance Correlation Coefficient and the Mean Square Error

    We derive the mapping between two of the most pervasive utility functions, the mean square error ($MSE$) and the concordance correlation coefficient (CCC, $\rho_c$). Despite its drawbacks, $MSE$ is one of the most popular performance metrics (and loss functions), joined lately by $\rho_c$ in many sequence prediction challenges. Despite the ever-growing simultaneous usage of the two, e.g., in inter-rater agreement and assay validation, a mapping between the two metrics has been missing to date. While minimisation of the $L_p$ norm of the errors or of its positive powers (e.g., $MSE$) is aimed at $\rho_c$ maximisation, we explain the often-witnessed ineffectiveness of this popular loss function with graphical illustrations. The discovered formula not only reveals the counterintuitive fact that '$MSE_1<MSE_2$' does not imply '$\rho_{c_1}>\rho_{c_2}$', but also provides the precise range of the $\rho_c$ metric for a given $MSE$. We derive the conditions for $\rho_c$ optimisation for a given $MSE$ and, as a logical next step, for a given set of errors. We generalise these conditions to any given $L_p$ norm for even $p$. We present newly discovered, albeit apparent, mathematical paradoxes. The study inspires and anticipates a growing use of $\rho_c$-inspired loss functions, e.g., $\left|\frac{MSE}{\sigma_{XY}}\right|$, replacing the traditional $L_p$-norm loss functions in multivariate regressions. Comment: Why this discovery, or the mapping formulation, is important: $MSE_1<MSE_2$ does not imply $CCC_1>CCC_2$. In other words, MSE minimisation does not necessarily guarantee CCC maximisation.
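
    A small numerical check makes the central claim concrete. The sketch below uses the standard population definitions of MSE and CCC, from which the identity CCC = 2·cov / (MSE + 2·cov) follows directly. It is not taken from the paper, and the data and helper names (`gold`, `pred_1`, `pred_2`, `mse`, `ccc`) are made up for illustration. A constant prediction at the gold mean has the lower MSE but zero CCC, while an over-scaled prediction has a higher MSE and a clearly positive CCC.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance (divides by N); covariance(xs, xs) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def mse(pred, gold):
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold)


def ccc(pred, gold):
    """Concordance correlation coefficient from its textbook definition."""
    s_xy = covariance(pred, gold)
    denom = covariance(pred, pred) + covariance(gold, gold) + (mean(pred) - mean(gold)) ** 2
    return 2 * s_xy / denom


# Illustrative data (hypothetical, chosen so the arithmetic is easy to follow).
gold = [0.0, 1.0, 2.0, 3.0, 4.0]
pred_1 = [2.0, 2.0, 2.0, 2.0, 2.0]    # constant at the gold mean: MSE 2.0, CCC 0.0
pred_2 = [-3.0, -0.5, 2.0, 4.5, 7.0]  # over-scaled but concordant: MSE 4.5, CCC ~0.69

if __name__ == "__main__":
    for name, pred in (("pred_1", pred_1), ("pred_2", pred_2)):
        print(name, "MSE =", mse(pred, gold), "CCC =", round(ccc(pred, gold), 4))
```

    Here the prediction with the smaller MSE also has the smaller CCC, so ranking models by MSE and ranking them by CCC can disagree.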

    Scaling Speech Enhancement in Unseen Environments with Noise Embeddings

    We address the problem of speech enhancement generalisation to unseen environments by performing two manipulations. First, we embed an additional recording from the environment alone, and use this embedding to alter activations in the main enhancement subnetwork. Second, we scale the number of noise environments present at training time to 16,784 different environments. Experimental results show that both manipulations reduce word error rates of a pretrained speech recognition system and improve enhancement quality according to a number of performance measures. Specifically, our best model reduces the word error rate from 34.04% on noisy speech to 15.46% on the enhanced speech. Enhanced audio samples can be found at https://speechenhancement.page.link/samples

    Calibrated Prediction Intervals for Neural Network Regressors

    Ongoing developments in neural network models are continually advancing the state of the art in terms of system accuracy. However, the predicted labels should not be regarded as the only core output; also important is a well-calibrated estimate of the prediction uncertainty. Such estimates and their calibration are critical in many practical applications. Despite their advantage in accuracy, contemporary neural networks are generally poorly calibrated and as such do not produce reliable output probability estimates. Further, while post-processing calibration solutions can be found in the relevant literature, these tend to target classification systems. We present two novel methods for acquiring calibrated prediction intervals for neural network regressors: empirical calibration and temperature scaling. In experiments using different regression tasks from the audio and computer vision domains, we find that both proposed methods are capable of producing calibrated prediction intervals for neural network regressors at any desired confidence level, a finding that is consistent across all datasets and neural network architectures we experimented with. In addition, we derive a practical recommendation for producing more accurate calibrated prediction intervals. The source code implementing our proposed methods for computing calibrated prediction intervals is publicly available.
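
    To show what a calibrated prediction interval means in practice, here is a generic sketch that picks an interval half-width from held-out absolute errors so that the requested fraction of validation points falls inside the interval. This is a common residual-quantile construction assumed here for illustration. It is not necessarily the paper's empirical calibration or temperature scaling procedure, and the function names are hypothetical.

```python
import math


def empirical_half_width(val_targets, val_predictions, confidence):
    """Smallest half-width covering at least `confidence` of the validation errors.

    Hypothetical helper: a generic residual-quantile rule, not the paper's method.
    """
    abs_errors = sorted(abs(t - p) for t, p in zip(val_targets, val_predictions))
    # Number of validation points that must fall inside the interval.
    k = math.ceil(confidence * len(abs_errors))
    k = min(max(k, 1), len(abs_errors))
    return abs_errors[k - 1]


def prediction_interval(prediction, half_width):
    """Symmetric interval around a point prediction."""
    return (prediction - half_width, prediction + half_width)


def coverage(targets, predictions, half_width):
    """Fraction of targets that fall inside their prediction intervals."""
    inside = sum(1 for t, p in zip(targets, predictions) if abs(t - p) <= half_width)
    return inside / len(targets)


if __name__ == "__main__":
    val_targets = [0.0] * 10
    val_predictions = [float(i) for i in range(1, 11)]  # absolute errors 1..10
    width = empirical_half_width(val_targets, val_predictions, confidence=0.8)
    print(width, prediction_interval(5.0, width))
```

    Coverage measured on the validation set is at least the requested level by construction; whether it holds on new data depends on the validation errors being representative.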

    Fast Single-Class Classification and the Principle of Logit Separation

    We consider neural network training in applications in which there are many possible classes, but at test-time the task is a binary classification task of determining whether the given example belongs to a specific class, where the class of interest can be different each time the classifier is applied. For instance, this is the case for real-time image search. We define the Single Logit Classification (SLC) task: training the network so that at test-time, it would be possible to accurately identify whether the example belongs to a given class in a computationally efficient manner, based only on the output logit for this class. We propose a natural principle, the Principle of Logit Separation, as a guideline for choosing and designing losses suitable for the SLC. We show that the cross-entropy loss function is not aligned with the Principle of Logit Separation. In contrast, there are known loss functions, as well as novel batch loss functions that we propose, which are aligned with this principle. In total, we study seven loss functions. Our experiments show that in almost all cases, losses that are aligned with the Principle of Logit Separation obtain at least 20% relative accuracy improvement in the SLC task compared to losses that are not aligned with it, and sometimes considerably more. Furthermore, we show that fast SLC does not cause any drop in binary classification accuracy, compared to standard classification in which all logits are computed, and yields a speedup which grows with the number of classes. For instance, we demonstrate a 10x speedup when the number of classes is 400,000. TensorFlow code for optimizing the new batch losses is publicly available at https://github.com/cruvadom/Logit Separation. Comment: Published as a conference paper in ICDM 201
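
    The test-time saving described in this abstract comes from computing one logit instead of all of them. The sketch below assumes a final linear layer stored as one weight row per class and a decision threshold of 0.0. The function names, the layer layout, and the threshold are assumptions for illustration and are not taken from the paper or its repository.

```python
def single_logit(features, weights, biases, class_index):
    """Logit of one class only: a single dot product with that class's weight row."""
    row = weights[class_index]
    return sum(w * x for w, x in zip(row, features)) + biases[class_index]


def belongs_to_class(features, weights, biases, class_index, threshold=0.0):
    """Binary decision from one logit; the threshold value is an assumed placeholder."""
    return single_logit(features, weights, biases, class_index) > threshold


def all_logits(features, weights, biases):
    """Standard approach for comparison: one dot product per class."""
    return [single_logit(features, weights, biases, k) for k in range(len(weights))]


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # toy final layer, 3 classes
    biases = [0.0, 0.0, 0.0]
    features = [2.0, -1.0]
    print(belongs_to_class(features, weights, biases, class_index=0))
    print(all_logits(features, weights, biases))
```

    The single-logit path costs one dot product per query regardless of the number of classes, which is why the speedup grows with the class count. A fixed threshold is only meaningful if training made logits comparable across examples, which is the property the abstract attributes to losses aligned with the Principle of Logit Separation.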

    Editorial: IEEE Transactions on Affective Computing: Challenges and Chances

    Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives

    Over the past few years, adversarial training has become an extremely active research topic and has been successfully applied to various Artificial Intelligence (AI) domains. Because it is a potentially crucial technique for the development of the next generation of emotional AI systems, we herein provide a comprehensive overview of the application of adversarial training to affective computing and sentiment analysis. Various representative adversarial training algorithms, aimed at tackling diverse challenges associated with emotional AI systems, are explained and discussed. Further, we highlight a range of potential future research directions. We expect that this overview will help facilitate the development of adversarial training for affective computing and sentiment analysis in both the academic and industrial communities.